Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 5530 |
| Missing cells | 5933 |
| Missing cells (%) | 7.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.3 MiB |
| Average record size in memory | 429.0 B |

Variable types
| NUM | 9 |
|---|---|
| CAT | 6 |

Reproduction
| Analysis started | 2022-03-03 16:50:43.132522 |
|---|---|
| Analysis finished | 2022-03-03 16:51:05.089354 |
| Duration | 21.96 seconds |
| Version | pandas-profiling v2.7.1 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |

Warnings

| Warning | Type |
|---|---|
| CUST_ID has a high cardinality: 5530 distinct values | High cardinality |
| CASH_ADVANCE has a high cardinality: 2609 distinct values | High cardinality |
| PURCHASES_TRX has a high cardinality: 80 distinct values | High cardinality |
| MINIMUM_PAYMENTS has a high cardinality: 5441 distinct values | High cardinality |
| GENDER has 2714 (49.1%) missing values | Missing |
| CASH_ADVANCE_TRX has 150 (2.7%) missing values | Missing |
| ONEOFF_PURCHASES_FREQUENCY has 2740 (49.5%) missing values | Missing |
| CASH_ADVANCE_FREQUENCY has 166 (3.0%) missing values | Missing |
| TENURE has 163 (2.9%) missing values | Missing |
| CUST_ID is uniformly distributed | Uniform |
| CUST_ID has unique values | Unique |
| PAYMENTS has unique values | Unique |
| PURCHASES has 1393 (25.2%) zeros | Zeros |
| CASH_ADVANCE_TRX has 2812 (50.8%) zeros | Zeros |
| PURCHASES_FREQUENCY has 1392 (25.2%) zeros | Zeros |
| ONEOFF_PURCHASES_FREQUENCY has 1464 (26.5%) zeros | Zeros |
| CASH_ADVANCE_FREQUENCY has 2801 (50.7%) zeros | Zeros |
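The warnings above are simple per-column checks. As a rough illustration, the stdlib-only Python sketch below derives the high-cardinality, missing, unique and zeros flags for one column. The helper `column_warnings` and both thresholds are hypothetical: they are not taken from pandas-profiling's source or from the report's `config.yaml`, and the uniform-distribution check is left out.

```python
from typing import Any, List, Optional

# Hypothetical thresholds chosen for illustration; they are NOT read from config.yaml.
CARDINALITY_THRESHOLD = 50  # distinct values above which a categorical column is flagged
ZEROS_THRESHOLD = 0.10      # share of zeros above which a numeric column is flagged


def column_warnings(name: str, values: List[Optional[Any]], categorical: bool) -> List[str]:
    """Return warnings for one column, phrased like the table above (None = missing)."""
    n = len(values)
    present = [v for v in values if v is not None]
    n_distinct = len(set(present))
    warnings: List[str] = []

    if categorical and n_distinct > CARDINALITY_THRESHOLD:
        warnings.append(f"{name} has a high cardinality: {n_distinct} distinct values")

    n_missing = n - len(present)
    if n_missing > 0:
        warnings.append(f"{name} has {n_missing} ({n_missing / n:.1%}) missing values")

    # Unique: no missing rows and every value different.
    if n_missing == 0 and n_distinct == n:
        warnings.append(f"{name} has unique values")

    if not categorical:
        n_zeros = sum(1 for v in present if v == 0)
        # The share is taken over all rows, including missing ones.
        if n_zeros / n > ZEROS_THRESHOLD:
            warnings.append(f"{name} has {n_zeros} ({n_zeros / n:.1%}) zeros")
    return warnings
```

Using the full row count as the denominator matches the table: 2812 zeros out of 5530 rows gives the 50.8% reported for CASH_ADVANCE_TRX, even though that column also has 150 missing values.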
CUST_ID
Categorical

| Distinct count | 5530 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| C12529 | 1 | < 0.1% | |
| C14163 | 1 | < 0.1% | |
| C13265 | 1 | < 0.1% | |
| C16553 | 1 | < 0.1% | |
| C16098 | 1 | < 0.1% | |
| C17438 | 1 | < 0.1% | |
| C17125 | 1 | < 0.1% | |
| C11238 | 1 | < 0.1% | |
| C18933 | 1 | < 0.1% | |
| C12713 | 1 | < 0.1% | |
| Other values (5520) | 5520 | 99.8% |

Length
| Max length | 6 |
|---|---|
| Mean length | 6 |
| Min length | 6 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 90.9% | |
| Uppercase_Letter | 1 | 9.1% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 10 | 90.9% | |
| Latin | 1 | 9.1% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11 | 100.0% |
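The character tables under each Length block group characters by Unicode general category, script and block. For CUST_ID the counts (ten digits plus the letter C) suggest they count distinct characters. Under that assumption, here is a stdlib-only sketch; Python's `unicodedata` exposes general categories but not scripts, so script lookup is omitted and the block check covers only ASCII. `category_counts`, `is_ascii_block` and the `CATEGORY_NAMES` mapping are hypothetical helpers.

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable

# Long names for the general-category codes returned by unicodedata.category();
# only the categories that appear in this report are mapped.
CATEGORY_NAMES: Dict[str, str] = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Po": "Other_Punctuation",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count the distinct characters of a column per Unicode general category."""
    counts: Counter = Counter()
    for ch in set("".join(values)):
        code = unicodedata.category(ch)  # e.g. "Nd" for a digit
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def is_ascii_block(values: Iterable[str]) -> bool:
    """True if every character lies in the Basic Latin (ASCII) block, U+0000 to U+007F."""
    return all(ord(ch) < 0x80 for value in values for ch in value)
```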
GENDER
Categorical

| Distinct count | 2 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 2714 |
| Missing (%) | 49.1% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| F | 1443 | 26.1% | |
| M | 1373 | 24.8% | |
| (Missing) | 2714 | 49.1% |
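Each common-values table lists the most frequent values, folds the rest into an "Other values (k)" row (k is the number of remaining distinct values, followed by their total count) and ends with a "(Missing)" row when values are missing. A hypothetical stdlib sketch of that layout follows; the share formatting rule is inferred from the tables (in PURCHASES, 3 of 5530 rows shows as 0.1% while 2 of 5530 shows as < 0.1%), not taken from pandas-profiling's source.

```python
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[str, int, str]


def fmt_share(count: int, total: int) -> str:
    """Format a share like the report: anything that rounds to 0.0% is shown as '< 0.1%'."""
    text = f"{count / total:.1%}"
    return "< 0.1%" if text == "0.0%" and count > 0 else text


def common_values(values: List[Optional[str]], top: int = 10) -> List[Row]:
    """Top-k values, then an 'Other values (k)' row, then '(Missing)' (None = missing)."""
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    rows: List[Row] = [(v, c, fmt_share(c, n)) for v, c in counts.most_common(top)]

    n_other_distinct = len(counts) - len(rows)
    if n_other_distinct > 0:
        n_other = sum(counts.values()) - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({n_other_distinct})", n_other, fmt_share(n_other, n)))

    n_missing = n - sum(counts.values())
    if n_missing > 0:
        rows.append(("(Missing)", n_missing, fmt_share(n_missing, n)))
    return rows
```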
Length
| Max length | 3 |
|---|---|
| Mean length | 1.981555154 |
| Min length | 1 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase_Letter | 2 | 50.0% | |
| Lowercase_Letter | 2 | 50.0% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 4 | 100.0% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4 | 100.0% |

BALANCE
Real number (ℝ)

| Distinct count | 5525 |
|---|---|
| Unique (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1041.700462707233 |
| Minimum | -4587.892398 |
| Maximum | 7390.19856 |
| Zeros | 6 |
| Zeros (%) | 0.1% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | -4587.892398 |
|---|---|
| 5th percentile | 3.88240525 |
| Q1 | 74.060304 |
| median | 632.7436345 |
| Q3 | 1545.808455 |
| 95th percentile | 3869.371332 |
| Maximum | 7390.19856 |
| Range | 11978.09096 |
| Interquartile range (IQR) | 1471.748151 |

Descriptive statistics
| Standard deviation | 1353.093044 |
|---|---|
| Coefficient of variation (CV) | 1.29892718 |
| Kurtosis | 3.290218207 |
| Mean | 1041.700463 |
| Median Absolute Deviation (MAD) | 594.745598 |
| Skewness | 1.475458824 |
| Sum | 5760603.559 |
| Variance | 1830860.785 |
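The quantile and descriptive tables can be reproduced from the raw values. The stdlib-only sketch below assumes linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`, which should agree with NumPy's default percentile method), a sample standard deviation with an n - 1 denominator, CV = standard deviation / mean (for BALANCE, 1353.093044 / 1041.700463 ≈ 1.2989), IQR = Q3 - Q1 (1545.808455 - 74.060304 = 1471.748151) and MAD as the median of absolute deviations from the median. Skewness and kurtosis are left out, and `describe` is a hypothetical helper name.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Quantile and descriptive statistics laid out like the report's tables."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "max": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(v - median) for v in values),  # median absolute deviation
        "sum": sum(values),
    }
```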
Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 0.1% | |
| 107.944741 | 1 | < 0.1% | |
| 109.621031 | 1 | < 0.1% | |
| 952.51575 | 1 | < 0.1% | |
| 559.151424 | 1 | < 0.1% | |
| 3356.816523 | 1 | < 0.1% | |
| 4117.751094 | 1 | < 0.1% | |
| 1679.952713 | 1 | < 0.1% | |
| 1179.746682 | 1 | < 0.1% | |
| 37.307085 | 1 | < 0.1% | |
| Other values (5515) | 5515 | 99.7% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -4587.892398 | 1 | < 0.1% | |
| -4530.639094 | 1 | < 0.1% | |
| -4251.411617 | 1 | < 0.1% | |
| -4071.993764 | 1 | < 0.1% | |
| -3948.776884 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 7390.19856 | 1 | < 0.1% | |
| 7347.355967 | 1 | < 0.1% | |
| 7293.108794 | 1 | < 0.1% | |
| 7215.745096 | 1 | < 0.1% | |
| 7152.864372 | 1 | < 0.1% |

PURCHASES
Real number (ℝ≥0)

| Distinct count | 3682 |
|---|---|
| Unique (%) | 66.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 534.5771030741411 |
| Minimum | 0.0 |
| Maximum | 9661.37 |
| Zeros | 1393 |
| Zeros (%) | 25.2% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 269.13 |
| Q3 | 723.7 |
| 95th percentile | 1975.906 |
| Maximum | 9661.37 |
| Range | 9661.37 |
| Interquartile range (IQR) | 723.7 |

Descriptive statistics
| Standard deviation | 773.4887449 |
|---|---|
| Coefficient of variation (CV) | 1.446917087 |
| Kurtosis | 18.5878817 |
| Mean | 534.5771031 |
| Median Absolute Deviation (MAD) | 269.13 |
| Skewness | 3.268794177 |
| Sum | 2956211.38 |
| Variance | 598284.8385 |

Histogram with fixed-size bins (bins=10)
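The histogram image is not included here. The counts such a plot shows can be sketched as below, assuming the fixed-size bins are 10 equal-width intervals spanning [minimum, maximum] (for PURCHASES, 9661.37 / 10 = 966.137 per bin). `fixed_width_histogram` is a hypothetical helper, and its edge handling only approximates NumPy's `histogram`.

```python
from typing import List, Tuple


def fixed_width_histogram(values: List[float], bins: int = 10) -> List[Tuple[float, float, int]]:
    """Split [min, max] into `bins` equal-width intervals and count the values in each.

    Intervals are half-open [lo, hi) except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0  # avoid dividing by zero for a constant column
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum falls into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return [(edges[i], edges[i + 1], counts[i]) for i in range(bins)]
```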
Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1393 | 25.2% | |
| 45.65 | 21 | 0.4% | |
| 150 | 14 | 0.3% | |
| 60 | 12 | 0.2% | |
| 100 | 10 | 0.2% | |
| 450 | 10 | 0.2% | |
| 600 | 9 | 0.2% | |
| 50 | 9 | 0.2% | |
| 250 | 9 | 0.2% | |
| 120 | 9 | 0.2% | |
| Other values (3672) | 4034 | 72.9% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1393 | 25.2% | |
| 0.01 | 3 | 0.1% | |
| 0.05 | 1 | < 0.1% | |
| 0.24 | 1 | < 0.1% | |
| 1 | 2 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 9661.37 | 1 | < 0.1% | |
| 8945.67 | 1 | < 0.1% | |
| 8834.96 | 1 | < 0.1% | |
| 8591.31 | 1 | < 0.1% | |
| 7311.99 | 1 | < 0.1% |

BALANCE_FREQUENCY
Real number (ℝ≥0)

| Distinct count | 58 |
|---|---|
| Unique (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.48255227016275 |
| Minimum | 0.0 |
| Maximum | 1000.0 |
| Zeros | 6 |
| Zeros (%) | 0.1% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0.363636 |
| Q1 | 0.833333 |
| median | 1 |
| Q3 | 1 |
| 95th percentile | 1 |
| Maximum | 1000 |
| Range | 1000 |
| Interquartile range (IQR) | 0.166667 |

Descriptive statistics
| Standard deviation | 152.899316 |
|---|---|
| Coefficient of variation (CV) | 5.773586866 |
| Kurtosis | 34.06665053 |
| Mean | 26.48255227 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.96293972 |
| Sum | 146448.5141 |
| Variance | 23378.20083 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3554 | 64.3% | |
| 0.909091 | 275 | 5.0% | |
| 0.818182 | 188 | 3.4% | |
| 0.545455 | 158 | 2.9% | |
| 0.636364 | 147 | 2.7% | |
| 0.727273 | 145 | 2.6% | |
| 0.454545 | 135 | 2.4% | |
| 0.363636 | 125 | 2.3% | |
| 1000 | 111 | 2.0% | |
| 0.272727 | 110 | 2.0% | |
| Other values (48) | 582 | 10.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 0.1% | |
| 0.090909 | 23 | 0.4% | |
| 0.1 | 1 | < 0.1% | |
| 0.125 | 2 | < 0.1% | |
| 0.142857 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1000 | 111 | 2.0% | |
| 909.091 | 9 | 0.2% | |
| 888.889 | 1 | < 0.1% | |
| 857.143 | 2 | < 0.1% | |
| 833.333 | 1 | < 0.1% |

CASH_ADVANCE
Categorical

| Distinct count | 2609 |
|---|---|
| Unique (%) | 47.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 2808 | 50.8% | |
| ?? | 75 | 1.4% | |
| 0.0?ñ | 41 | 0.7% | |
| 472.818286 | 1 | < 0.1% | |
| 2436.195048 | 1 | < 0.1% | |
| 1831.115496 | 1 | < 0.1% | |
| 1957.772343 | 1 | < 0.1% | |
| 1288.83283 | 1 | < 0.1% | |
| 188.234434 | 1 | < 0.1% | |
| 2908.400137 | 1 | < 0.1% | |
| Other values (2599) | 2599 | 47.0% |
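CASH_ADVANCE is counted among the categorical (CAT) variables although most of its values are numbers; presumably this is because tokens such as `??` and `0.0?ñ` in the table above cannot be parsed as numbers. MINIMUM_PAYMENTS, PURCHASES_TRX and TENURE contain similar tokens (`??`, `92.369903?ñ`, `12?ñ`). A minimal sketch of that check, assuming a column counts as numeric only when every value parses as a float; pandas-profiling's actual type inference is more involved, and `try_float` and `unparseable` are hypothetical helpers.

```python
from typing import Iterable, List, Optional


def try_float(token: str) -> Optional[float]:
    """Parse a token as a float, returning None instead of raising."""
    try:
        return float(token)  # also accepts "nan", "inf" and surrounding whitespace
    except ValueError:
        return None


def unparseable(values: Iterable[str]) -> List[str]:
    """Distinct tokens that prevent a column from being read as numeric, sorted."""
    return sorted({v for v in values if try_float(v) is None})
```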
Length
| Max length | 13 |
|---|---|
| Mean length | 6.456057866 |
| Min length | 2 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 76.9% | |
| Other_Punctuation | 2 | 15.4% | |
| Lowercase_Letter | 1 | 7.7% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 12 | 92.3% | |
| Latin | 1 | 7.7% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 12 | 100.0% |

CASH_ADVANCE_TRX
Real number (ℝ≥0)

| Distinct count | 34 |
|---|---|
| Unique (%) | 0.6% |
| Missing | 150 |
| Missing (%) | 2.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.11542750929368 |
| Minimum | 0.0 |
| Maximum | 18000.0 |
| Zeros | 2812 |
| Zeros (%) | 50.8% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 3 |
| 95th percentile | 12 |
| Maximum | 18000 |
| Range | 18000 |
| Interquartile range (IQR) | 3 |

Descriptive statistics
| Standard deviation | 573.8177709 |
|---|---|
| Coefficient of variation (CV) | 11.68304543 |
| Kurtosis | 469.4166907 |
| Mean | 49.11542751 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 19.33841254 |
| Sum | 264241 |
| Variance | 329266.8342 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2812 | 50.8% | |
| 1 | 562 | 10.2% | |
| 2 | 393 | 7.1% | |
| 3 | 290 | 5.2% | |
| 4 | 234 | 4.2% | |
| 5 | 204 | 3.7% | |
| 6 | 159 | 2.9% | |
| 7 | 130 | 2.4% | |
| 8 | 105 | 1.9% | |
| 10 | 83 | 1.5% | |
| Other values (24) | 408 | 7.4% | |
| (Missing) | 150 | 2.7% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2812 | 50.8% | |
| 1 | 562 | 10.2% | |
| 2 | 393 | 7.1% | |
| 3 | 290 | 5.2% | |
| 4 | 234 | 4.2% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 18000 | 1 | < 0.1% | |
| 17000 | 1 | < 0.1% | |
| 14000 | 1 | < 0.1% | |
| 12000 | 1 | < 0.1% | |
| 10000 | 1 | < 0.1% |

PURCHASES_FREQUENCY
Real number (ℝ≥0)

| Distinct count | 69 |
|---|---|
| Unique (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.206005977034355 |
| Minimum | 0.0 |
| Maximum | 1000.0 |
| Zeros | 1392 |
| Zeros (%) | 25.2% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 0.363636 |
| Q3 | 0.833333 |
| 95th percentile | 1 |
| Maximum | 1000 |
| Range | 1000 |
| Interquartile range (IQR) | 0.833333 |

Descriptive statistics
| Standard deviation | 93.75767056 |
|---|---|
| Coefficient of variation (CV) | 7.681273525 |
| Kurtosis | 82.01112325 |
| Mean | 12.20600598 |
| Median Absolute Deviation (MAD) | 0.363636 |
| Skewness | 8.892601479 |
| Sum | 67499.21305 |
| Variance | 8790.500789 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1392 | 25.2% | |
| 1 | 881 | 15.9% | |
| 0.083333 | 465 | 8.4% | |
| 0.5 | 277 | 5.0% | |
| 0.166667 | 274 | 5.0% | |
| 0.25 | 237 | 4.3% | |
| 0.333333 | 233 | 4.2% | |
| 0.833333 | 230 | 4.2% | |
| 0.416667 | 216 | 3.9% | |
| 0.666667 | 211 | 3.8% | |
| Other values (59) | 1114 | 20.1% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1392 | 25.2% | |
| 0.083333 | 465 | 8.4% | |
| 0.090909 | 35 | 0.6% | |
| 0.1 | 18 | 0.3% | |
| 0.111111 | 12 | 0.2% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1000 | 26 | 0.5% | |
| 916.667 | 6 | 0.1% | |
| 900 | 1 | < 0.1% | |
| 857.143 | 1 | < 0.1% | |
| 833.333 | 4 | 0.1% |

PURCHASES_TRX
Categorical

| Distinct count | 80 |
|---|---|
| Unique (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1353 | 24.5% | |
| 1 | 460 | 8.3% | |
| 12 | 387 | 7.0% | |
| 2 | 252 | 4.6% | |
| 6 | 243 | 4.4% | |
| 4 | 203 | 3.7% | |
| 3 | 196 | 3.5% | |
| 5 | 190 | 3.4% | |
| 8 | 186 | 3.4% | |
| 7 | 184 | 3.3% | |
| Other values (70) | 1876 | 33.9% |

Length
| Max length | 7 |
|---|---|
| Mean length | 1.45045208 |
| Min length | 1 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 83.3% | |
| Other_Punctuation | 1 | 8.3% | |
| Lowercase_Letter | 1 | 8.3% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 11 | 91.7% | |
| Latin | 1 | 8.3% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11 | 100.0% |

ONEOFF_PURCHASES_FREQUENCY
Real number (ℝ≥0)

| Distinct count | 41 |
|---|---|
| Unique (%) | 1.5% |
| Missing | 2740 |
| Missing (%) | 49.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.14829775232974912 |
| Minimum | 0.0 |
| Maximum | 1.0 |
| Zeros | 1464 |
| Zeros (%) | 26.5% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.166667 |
| 95th percentile | 0.75 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.166667 |

Descriptive statistics
| Standard deviation | 0.241687055 |
|---|---|
| Coefficient of variation (CV) | 1.629741862 |
| Kurtosis | 3.442475174 |
| Mean | 0.1482977523 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.013350785 |
| Sum | 413.750729 |
| Variance | 0.05841263257 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1464 | 26.5% | |
| 0.083333 | 376 | 6.8% | |
| 0.166667 | 214 | 3.9% | |
| 0.25 | 131 | 2.4% | |
| 0.333333 | 88 | 1.6% | |
| 0.416667 | 83 | 1.5% | |
| 1 | 64 | 1.2% | |
| 0.5 | 61 | 1.1% | |
| 0.583333 | 42 | 0.8% | |
| 0.666667 | 38 | 0.7% | |
| Other values (31) | 229 | 4.1% | |
| (Missing) | 2740 | 49.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1464 | 26.5% | |
| 0.083333 | 376 | 6.8% | |
| 0.090909 | 23 | 0.4% | |
| 0.1 | 13 | 0.2% | |
| 0.111111 | 11 | 0.2% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 64 | 1.2% | |
| 0.916667 | 28 | 0.5% | |
| 0.909091 | 1 | < 0.1% | |
| 0.875 | 1 | < 0.1% | |
| 0.833333 | 21 | 0.4% |

CASH_ADVANCE_FREQUENCY
Real number (ℝ≥0)

| Distinct count | 46 |
|---|---|
| Unique (%) | 0.9% |
| Missing | 166 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.11900540920954511 |
| Minimum | 0.0 |
| Maximum | 1.5 |
| Zeros | 2801 |
| Zeros (%) | 50.7% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.166667 |
| 95th percentile | 0.5 |
| Maximum | 1.5 |
| Range | 1.5 |
| Interquartile range (IQR) | 0.166667 |

Descriptive statistics
| Standard deviation | 0.1732062886 |
|---|---|
| Coefficient of variation (CV) | 1.455448872 |
| Kurtosis | 3.499384508 |
| Mean | 0.1190054092 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.786846819 |
| Sum | 638.345015 |
| Variance | 0.03000041842 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2801 | 50.7% | |
| 0.083333 | 664 | 12.0% | |
| 0.166667 | 466 | 8.4% | |
| 0.25 | 360 | 6.5% | |
| 0.333333 | 258 | 4.7% | |
| 0.416667 | 155 | 2.8% | |
| 0.5 | 105 | 1.9% | |
| 0.583333 | 75 | 1.4% | |
| 0.666667 | 56 | 1.0% | |
| 0.090909 | 49 | 0.9% | |
| Other values (36) | 375 | 6.8% | |
| (Missing) | 166 | 3.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2801 | 50.7% | |
| 0.083333 | 664 | 12.0% | |
| 0.090909 | 49 | 0.9% | |
| 0.1 | 28 | 0.5% | |
| 0.111111 | 18 | 0.3% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1.5 | 1 | < 0.1% | |
| 1.166667 | 1 | < 0.1% | |
| 1 | 4 | 0.1% | |
| 0.916667 | 2 | < 0.1% | |
| 0.9 | 1 | < 0.1% |

CREDIT_LIMIT
Real number (ℝ≥0)

| Distinct count | 134 |
|---|---|
| Unique (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3588.0952563609403 |
| Minimum | 50.0 |
| Maximum | 12500.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 50 |
|---|---|
| 5th percentile | 1000 |
| Q1 | 1500 |
| median | 2900 |
| Q3 | 5000 |
| 95th percentile | 9000 |
| Maximum | 12500 |
| Range | 12450 |
| Interquartile range (IQR) | 3500 |

Descriptive statistics
| Standard deviation | 2640.396238 |
|---|---|
| Coefficient of variation (CV) | 0.7358768509 |
| Kurtosis | 0.5970263702 |
| Mean | 3588.095256 |
| Median Absolute Deviation (MAD) | 1500 |
| Skewness | 1.145162447 |
| Sum | 19842166.77 |
| Variance | 6971692.293 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 3000 | 563 | 10.2% | |
| 1500 | 542 | 9.8% | |
| 1200 | 457 | 8.3% | |
| 1000 | 454 | 8.2% | |
| 2500 | 426 | 7.7% | |
| 4000 | 317 | 5.7% | |
| 6000 | 281 | 5.1% | |
| 2000 | 280 | 5.1% | |
| 5000 | 225 | 4.1% | |
| 7000 | 147 | 2.7% | |
| Other values (124) | 1838 | 33.2% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 1 | < 0.1% | |
| 150 | 4 | 0.1% | |
| 200 | 3 | 0.1% | |
| 300 | 12 | 0.2% | |
| 400 | 2 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 12500 | 12 | 0.2% | |
| 12000 | 31 | 0.6% | |
| 11500 | 25 | 0.5% | |
| 11000 | 25 | 0.5% | |
| 10750 | 1 | < 0.1% |

PAYMENTS
Real number (ℝ≥0)

| Distinct count | 5530 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1107.9898173103074 |
| Minimum | 0.056466 |
| Maximum | 9933.62261 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 43.3 KiB |

Quantile statistics

| Minimum | 0.056466 |
|---|---|
| 5th percentile | 124.9274707 |
| Q1 | 345.4311015 |
| median | 671.0016995 |
| Q3 | 1354.931507 |
| 95th percentile | 3710.658747 |
| Maximum | 9933.62261 |
| Range | 9933.566144 |
| Interquartile range (IQR) | 1009.500406 |

Descriptive statistics
| Standard deviation | 1270.892564 |
|---|---|
| Coefficient of variation (CV) | 1.147025491 |
| Kurtosis | 9.951139009 |
| Mean | 1107.989817 |
| Median Absolute Deviation (MAD) | 399.8415645 |
| Skewness | 2.78151989 |
| Sum | 6127183.69 |
| Variance | 1615167.91 |

Histogram with fixed-size bins (bins=10)

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 192.781455 | 1 | < 0.1% | |
| 628.237735 | 1 | < 0.1% | |
| 661.24329 | 1 | < 0.1% | |
| 876.63896 | 1 | < 0.1% | |
| 523.910288 | 1 | < 0.1% | |
| 264.032163 | 1 | < 0.1% | |
| 284.093261 | 1 | < 0.1% | |
| 1409.282903 | 1 | < 0.1% | |
| 890.174668 | 1 | < 0.1% | |
| 1072.433416 | 1 | < 0.1% | |
| Other values (5520) | 5520 | 99.8% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.056466 | 1 | < 0.1% | |
| 3.500505 | 1 | < 0.1% | |
| 4.523555 | 1 | < 0.1% | |
| 4.841543 | 1 | < 0.1% | |
| 9.533313 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 9933.62261 | 1 | < 0.1% | |
| 9858.055448 | 1 | < 0.1% | |
| 9801.637331 | 1 | < 0.1% | |
| 9724.871142 | 1 | < 0.1% | |
| 9614.697558 | 1 | < 0.1% |

MINIMUM_PAYMENTS
Categorical

| Distinct count | 5441 |
|---|---|
| Unique (%) | 98.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| ?? | 89 | 1.6% | |
| 299.351881 | 2 | < 0.1% | |
| 1311.061985 | 1 | < 0.1% | |
| 218.279194 | 1 | < 0.1% | |
| 596.541854 | 1 | < 0.1% | |
| 982.488109 | 1 | < 0.1% | |
| 1315.479892 | 1 | < 0.1% | |
| 351.744608 | 1 | < 0.1% | |
| 92.369903?ñ | 1 | < 0.1% | |
| 233.788637?ñ | 1 | < 0.1% | |
| Other values (5431) | 5431 | 98.2% |

Length
| Max length | 13 |
|---|---|
| Mean length | 9.779385172 |
| Min length | 2 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 76.9% | |
| Other_Punctuation | 2 | 15.4% | |
| Lowercase_Letter | 1 | 7.7% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 12 | 92.3% | |
| Latin | 1 | 7.7% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 12 | 100.0% |

TENURE
Categorical

| Distinct count | 19 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 163 |
| Missing (%) | 2.9% |
| Memory size | 43.3 KiB |

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 4226 | 76.4% | |
| 11 | 224 | 4.1% | |
| 10 | 149 | 2.7% | |
| 6 | 135 | 2.4% | |
| 7 | 125 | 2.3% | |
| -12 | 124 | 2.2% | |
| 8 | 119 | 2.2% | |
| 9 | 108 | 2.0% | |
| ?? | 69 | 1.2% | |
| 12?ñ | 56 | 1.0% | |
| Other values (9) | 32 | 0.6% | |
| (Missing) | 163 | 2.9% |

Length
| Max length | 4 |
|---|---|
| Mean length | 1.988969259 |
| Min length | 1 |

Unicode categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 7 | 58.3% | |
| Lowercase_Letter | 3 | 25.0% | |
| Dash_Punctuation | 1 | 8.3% | |
| Other_Punctuation | 1 | 8.3% |

Unicode scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 9 | 75.0% | |
| Latin | 3 | 25.0% |

Unicode blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11 | 100.0% |

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables: for an exact linear relationship Y = aX + b, r is +1 for every slope a > 0 and -1 for every a < 0, so the steepness of the line does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
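A direct stdlib translation of that definition follows, using a hypothetical `pearson_r` helper. The accompanying checks illustrate the location/scale property: slopes of 2 and 100 both give r = 1, and rescaling X by 10 and shifting it by 3 leaves r unchanged.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/(n - 1) factors of the covariance and the two standard deviations cancel,
    # so they are omitted. r is undefined (division by zero) if either input is constant.
    return cov / (sx * sy)
```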
Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations; in other words, ρ is Pearson's r applied to the ranks.
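Following that definition, ρ can be sketched as Pearson's r computed on ranks. The sketch assumes tied values receive the average of the ranks they span (the usual convention for ρ); `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helpers. For y = x³ on 1..4 it gives ρ = 1, while Pearson's r on the raw values is only about 0.95.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations (1/(n-1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))
```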
Kendall's τ

Like Spearman's ρ, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations: τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
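The definition above is the tau-a form. Many columns in this dataset are heavily tied (CASH_ADVANCE_TRX, for example, is 50.8% zeros), and library implementations such as SciPy's `scipy.stats.kendalltau` default to tau-b, which shrinks the denominator to account for tied pairs. The O(n²) sketch below computes both from the pair counts; `kendall_tau` and its `variant` parameter are hypothetical, and production code would use an O(n log n) algorithm instead.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float], variant: str = "a") -> float:
    """Kendall's tau from concordant and discordant pairs (O(n^2) sketch).

    variant="a": (C - D) / (n(n-1)/2), the definition given in the text.
    variant="b": (C - D) / sqrt((n0 - tx) * (n0 - ty)), where tx and ty count pairs tied in x and y.
    """
    n = len(x)
    n0 = n * (n - 1) // 2
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        # Pairs tied in x or y are neither concordant nor discordant.
    if variant == "a":
        return (concordant - discordant) / n0
    if variant == "b":
        return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
    raise ValueError("variant must be 'a' or 'b'")
```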
Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik project documentation.

Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.

First rows

| | CUST_ID | GENDER | BALANCE | PURCHASES | BALANCE_FREQUENCY | CASH_ADVANCE | CASH_ADVANCE_TRX | PURCHASES_FREQUENCY | PURCHASES_TRX | ONEOFF_PURCHASES_FREQUENCY | CASH_ADVANCE_FREQUENCY | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | TENURE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C12529 | F | 107.944741 | 118.16 | 0.875000 | 472.818286 | 1.0 | 0.125000 | 2 | 0.125 | 0.125000 | 2500.0 | 192.781455 | 56.999671 | 8 |
| 1 | C14138 | NaN | 241.032979 | 0.00 | 1.000000 | 642.862505 | 1.0 | 0.000000 | 0 | NaN | 0.083333 | 1500.0 | 915.454305 | 195.162256 | 12 |
| 2 | C15409 | NaN | 894.357857 | 1164.00 | 1.000000 | 0.0 | 0.0 | 1.000000 | 12 | NaN | 0.000000 | 2000.0 | 907.603723 | 270.413449 | -12 |
| 3 | C18141 | F | -188.132508 | 515.88 | 1.000000 | 0.0 | NaN | 0.833333 | 14 | NaN | 0.000000 | 2700.0 | 601.729266 | 194.534934 | 12 |
| 4 | C15879 | NaN | 3881.679582 | 15.92 | 1.000000 | 2183.782456 | 9.0 | 0.083333 | 1 | NaN | 0.333333 | 5500.0 | 1032.183632 | 1129.747227 | 12 |
| 5 | C17660 | NaN | 1087.784698 | 0.00 | 1.000000 | 1562.703953 | 2.0 | 0.000000 | 0 | 0.000 | 0.166667 | 1500.0 | 3093.888643 | 298.011965 | 12 |
| 6 | C10916 | NaN | 1081.065726 | 554.85 | 1.000000 | 952.424906 | 8.0 | 0.500000 | 20 | 0.250 | 0.166667 | 2100.0 | 1898.828120 | 382.716751 | 12 |
| 7 | C15128 | NaN | 100.208311 | 0.00 | 0.909091 | 182.143966 | 1.0 | 0.000000 | 0 | NaN | 0.090909 | 3000.0 | 175.911508 | 145.244181 | 11 |
| 8 | C10109 | NaN | 862.072380 | 0.00 | 1.000000 | 920.309805 | 1.0 | 0.000000 | 0 | 0.000 | 0.083333 | 4000.0 | 2236.890255 | 214.828158 | 12 |
| 9 | C17983 | NaN | 1757.439933 | 0.00 | 0.833333 | 2408.007601 | 6.0 | 0.000000 | 0 | 0.000 | 0.166667 | 2500.0 | 175.115831 | 450.616731 | 6 |

Last rows

| | CUST_ID | GENDER | BALANCE | PURCHASES | BALANCE_FREQUENCY | CASH_ADVANCE | CASH_ADVANCE_TRX | PURCHASES_FREQUENCY | PURCHASES_TRX | ONEOFF_PURCHASES_FREQUENCY | CASH_ADVANCE_FREQUENCY | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | TENURE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5520 | C16104 | NaN | 2525.683344 | 0.00 | 1.000000 | 285.193204 | 5.0 | 0.000000 | 0 | NaN | 0.166667 | 7000.0 | 1483.384610 | 702.052491 | 12 |
| 5521 | C19019 | NaN | 634.514354 | 0.00 | 0.909091 | 1682.137421 | 12.0 | 0.000000 | 0 | 0.000000 | 0.636364 | 1500.0 | 2162.277429 | 257.081648 | 11 |
| 5522 | C18355 | NaN | 930.656420 | 300.05 | 1.000000 | 0.0 | 0.0 | 0.750000 | 9 | NaN | 0.000000 | 1200.0 | 513.064156 | 330.422815 | 12 |
| 5523 | C18766 | NaN | 21.168201 | 236.40 | 1.000000 | 0.0 | 0.0 | 1.000000 | 24 | 1.000000 | NaN | 2500.0 | 217.008342 | 178.169321 | 12 |
| 5524 | C16616 | NaN | 846.091011 | 2599.20 | 1.000000 | 0.0 | 0.0 | 0.916667 | 19 | 0.333333 | 0.000000 | 3000.0 | 1900.699307 | 195.516066 | 12 |
| 5525 | C10075 | NaN | 656.013010 | 0.00 | 1000.000000 | 1474.349901 | 3.0 | 0.000000 | 0 | 0.000000 | 0.125000 | 7000.0 | 910.457985 | 140.983193 | 8 |
| 5526 | C17321 | NaN | 15.232505 | 384.00 | 0.272727 | 0.0 | 0.0 | 1.000000 | 12?ñ | NaN | 0.000000 | 1500.0 | 568.982664 | 54.449416 | 12 |
| 5527 | C12909 | NaN | 1023.124791 | 1537.93 | 1.000000 | 247.04197 | 1.0 | 0.750000 | 25 | 0.583333 | 0.083333 | 9000.0 | 1070.149971 | 235.241959 | -12 |
| 5528 | C15615 | F | 957.010021 | 604.80 | 1.000000 | 901.754709 | 3.0 | 1.000000 | 12 | NaN | 0.083333 | 1000.0 | 811.457190 | 926.087148 | 12 |
| 5529 | C12391 | NaN | 2664.700424 | 715.51 | 1.000000 | 494.573662 | 1.0 | 750.000000 | 11 | 0.083333 | 0.083333 | 3500.0 | 918.003032 | 792.902894 | 12 |